
Statistics Calculation & Analytics

Referenced Files

- placement_stats_calculator_service.py
- placement_service.py
- database_service.py
- db_client.py
- config.py
- placement_offers.json
- structured_job_listings.json
- README.md
- ARCHITECTURE.md
- DATABASE.md

Introduction

This document describes the statistics calculation and analytics service behind placement analytics at JIIT. It explains how placement statistics are computed: company-wise offer counts, package distribution, student selection trends, and placement rates. It covers data aggregation for historical tracking, trend analysis, and statistical reporting, together with the calculation methods for average packages, highest and lowest offers, branch-wise placement statistics, and time-series analysis. It also documents database integration for data retrieval, caching for performance, and real-time statistics updates, and includes examples of calculated metrics, visualization data formats, export capabilities, and data privacy considerations.

Project Structure

The statistics service is implemented as a modular Python service integrated into the broader notification bot ecosystem. Key components include:

- Placement statistics calculator service for computing branch-wise, company-wise, and overall metrics
- Database service for retrieving and persisting placement offers and statistics
- Email processing pipeline that extracts and structures placement offers for analytics
- Configuration and client modules for environment management and database connectivity

graph TB
    subgraph "Data Sources"
        Email["Placement Emails"]
        Portal["SuperSet Portal"]
        Official["Official Website"]
    end
    subgraph "Processing Layer"
        PS["PlacementService"]
        DB["DatabaseService"]
        Calc["PlacementStatsCalculatorService"]
    end
    subgraph "Analytics Outputs"
        Metrics["Placement Metrics"]
        CSV["CSV Export"]
        Trends["Historical Trends"]
    end
    Email --> PS
    Portal --> DB
    Official --> DB
    PS --> DB
    DB --> Calc
    Calc --> Metrics
    Calc --> CSV
    Calc --> Trends


Core Components

This section outlines the core components responsible for statistics computation and analytics.

PlacementStatsCalculatorService
- Computes overall statistics, branch-wise metrics, and company-wise aggregations
- Filters students by company, role, location, package range, and search query
- Provides CSV export functionality for administrative reporting
- Calculates average, median, and highest packages per unique student
- Computes placement percentages by branch and overall
DatabaseService
- Retrieves placement offers from MongoDB for analytics
- Provides raw statistics computation for official placement data
- Manages collections for notices, jobs, placement offers, users, and official placement data
PlacementService
- Extracts and structures placement offers from emails using LLM orchestration
- Sanitizes privacy-sensitive information
- Emits events for new and updated offers to support real-time updates
DBClient and Configuration
- Manages MongoDB connection and collection access
- Centralized configuration via Pydantic Settings with environment variables
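
The filtering behaviour listed for PlacementStatsCalculatorService above (company, role, location, package range, and search query) can be sketched as follows. This is a minimal illustration, not the service's actual code: the record keys (`name`, `enrollment`, `company`, `role`, `location`, `package`) and the function name `filter_students` are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def filter_students(
    students: Iterable[Dict],
    companies: Optional[List[str]] = None,
    roles: Optional[List[str]] = None,
    locations: Optional[List[str]] = None,
    package_range: Optional[Tuple[float, float]] = None,
    search_query: Optional[str] = None,
) -> List[Dict]:
    """Keep only the flattened student records that pass every supplied filter.

    A filter that is None (or empty) is treated as "no restriction".
    All record keys used here are hypothetical.
    """
    query = (search_query or "").strip().lower()
    kept = []
    for s in students:
        if companies and s.get("company") not in companies:
            continue
        if roles and s.get("role") not in roles:
            continue
        if locations and s.get("location") not in locations:
            continue
        if package_range is not None:
            low, high = package_range
            package = s.get("package")
            # Records without a package cannot satisfy a range filter.
            if package is None or not (low <= package <= high):
                continue
        if query:
            # Case-insensitive substring match over a few text fields.
            haystack = " ".join(
                str(s.get(key, "")) for key in ("name", "enrollment", "company", "role")
            ).lower()
            if query not in haystack:
                continue
        kept.append(s)
    return kept


sample = [
    {"name": "Student A", "enrollment": "E001", "company": "Acme",
     "role": "SDE", "location": "Noida", "package": 12.0},
    {"name": "Student B", "enrollment": "E002", "company": "Globex",
     "role": "Analyst", "location": "Pune", "package": 7.5},
]
```

Filters combine with AND semantics in this sketch, so `filter_students(sample, companies=["Acme"], package_range=(10.0, 15.0))` keeps only the first record.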


Architecture Overview

The statistics architecture integrates data ingestion, processing, persistence, and analytics computation across services and databases.

sequenceDiagram
    participant Email as "Email Source"
    participant PS as "PlacementService"
    participant DB as "DatabaseService"
    participant Calc as "PlacementStatsCalculatorService"
    Email->>PS : Raw email content
    PS->>PS : Classify + Extract + Validate + Sanitize
    PS->>DB : Save placement offers
    DB-->>PS : Save result + events
    PS-->>Email : Mark as read (after success)
    Note over PS,DB : Offers persisted to PlacementOffers collection
    Calc->>DB : Retrieve offers (get_all_offers)
    DB-->>Calc : Placement offers
    Calc->>Calc : Compute branch/company/overall stats
    Calc-->>Calc : Filtered stats (optional)
    Calc-->>Calc : CSV export data


Detailed Component Analysis

PlacementStatsCalculatorService

This service encapsulates all statistics computation logic, including:

- Branch resolution from enrollment numbers using predefined ranges
- Package resolution in priority order: a student-specific package, then the package of the matching role, then the package of the only role when the offer has a single role, and otherwise the maximum across the offer's roles
- Unique student counting per branch and per placement offer
- Average, median, and highest package computations using the highest package per unique student
- Branch-wise placement percentages based on eligible student counts
- Company-wise statistics including student counts, role profiles, and average packages
- Filtering pipeline supporting company, role, location, package range, and search query
- CSV export generation for administrative reporting
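
The package priority described in the list above can be sketched as a small fallback chain. This is an illustrative reading of that ordering, not the service's code: the function name `resolve_package` and the keys `package` and `role` on student and role records are assumptions.

```python
from typing import Dict, List, Optional


def resolve_package(student: Dict, roles: List[Dict]) -> Optional[float]:
    """Pick one package value for a student, following the documented priority.

    `student` and each entry of `roles` are plain dicts with hypothetical
    keys "role" and "package".
    """
    # 1. A package recorded directly on the student wins.
    if student.get("package") is not None:
        return float(student["package"])

    priced_roles = [r for r in roles if r.get("package") is not None]

    # 2. Otherwise use the package of the role the student was selected for.
    student_role = student.get("role")
    if student_role:
        for role in priced_roles:
            if role.get("role") == student_role:
                return float(role["package"])

    # 3. With exactly one priced role, that role's package applies.
    if len(priced_roles) == 1:
        return float(priced_roles[0]["package"])

    # 4. With several roles and no match, fall back to the maximum.
    if priced_roles:
        return max(float(r["package"]) for r in priced_roles)

    # No package information at all.
    return None
```

For example, a student with no package and role "PM" against roles priced 10 and 12 falls through to step 4 and resolves to 12.0 in this sketch.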

classDiagram
    class PlacementStatsCalculatorService {
        +__init__(db_service, enrollment_ranges, student_counts)
        +calculate_all_stats(placements) PlacementStats
        +calculate_filtered_stats(placements, companies, roles, locations, package_range, search_query) PlacementStats
        +export_to_csv_data(placements, filtered, **filter_kwargs) List[List[str]]
        -_flatten_students(placements) List[Dict]
        -_filter_students(students, exclude_branches, companies, roles, locations, package_range, search_query) List[Dict]
        -_calculate_package_stats(students) Tuple[List[float], float, float, float]
        -_calculate_branch_stats(students) Dict[str, BranchStats]
        -_calculate_company_stats(students) Dict[str, CompanyStats]
        -_get_branch_total_counts() Dict[str, int]
        -_extract_filter_options(placements) Dict[str, List[str]]
        -_get_branch(enrollment) str
    }
    class BranchStats {
        +branch : str
        +total_offers : int
        +unique_students : int
        +total_students_in_branch : int
        +packages : List[float]
        +avg_package : float
        +highest_package : float
        +median_package : float
        +placement_percentage : float
    }
    class CompanyStats {
        +company : str
        +students_count : int
        +profiles : Set[str]
        +packages : List[float]
        +avg_package : float
    }
    class PlacementStats {
        +unique_students_placed : int
        +total_offers : int
        +unique_companies : int
        +average_package : float
        +median_package : float
        +highest_package : float
        +placement_percentage : float
        +total_eligible_students : int
        +branch_stats : Dict[str, Dict]
        +company_stats : Dict[str, Dict]
        +available_filters : Dict[str, List[str]]
    }
    PlacementStatsCalculatorService --> BranchStats : "creates"
    PlacementStatsCalculatorService --> CompanyStats : "creates"
    PlacementStatsCalculatorService --> PlacementStats : "returns"
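
The `_get_branch(enrollment)` helper and the `enrollment_ranges` constructor argument in the diagram point to a range lookup. A minimal sketch of such a lookup is shown below. The shape of the ranges (branch name mapped to inclusive integer spans), the `"Unknown"` fallback, and the numbers in `EXAMPLE_RANGES` are all assumptions for illustration and are not real enrollment data.

```python
from typing import Dict, List, Tuple

# Branch name -> list of inclusive (start, end) enrollment spans.
EnrollmentRanges = Dict[str, List[Tuple[int, int]]]

# Placeholder values for illustration only; not actual JIIT ranges.
EXAMPLE_RANGES: EnrollmentRanges = {
    "CSE": [(1000, 1999)],
    "ECE": [(2000, 2499), (2600, 2699)],
}


def get_branch(enrollment: str, ranges: EnrollmentRanges, default: str = "Unknown") -> str:
    """Return the branch whose span contains the enrollment number."""
    try:
        number = int(str(enrollment).strip())
    except ValueError:
        # Non-numeric enrollment values cannot be matched to a span.
        return default
    for branch, spans in ranges.items():
        if any(start <= number <= end for start, end in spans):
            return branch
    return default
```

Returning a sentinel such as `"Unknown"` rather than raising keeps one malformed record from aborting a whole statistics run; whether the real service does this is not stated in the source.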


Database Integration and Data Retrieval

The DatabaseService provides:

- Retrieval of placement offers for analytics via get_all_offers
- Raw statistics computation for official placement data
- Upsert logic for merging offers and emitting events for new and updated offers
- Collection access for notices, jobs, placement offers, users, and official placement data
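
The upsert-and-emit behaviour in the list above can be illustrated with an in-memory stand-in. This sketch does not use MongoDB and is not the DatabaseService implementation: the dict `store` replaces the collection, and the merge key (`company`), the student de-duplication key (`enrollment`), and the event shapes are assumptions.

```python
from typing import Dict, List


def upsert_offer(store: Dict[str, Dict], offer: Dict) -> List[Dict]:
    """Merge `offer` into `store` (keyed by company) and return emitted events.

    Returns one "new_offer" event for an unseen company, one "update_offer"
    event when new students were merged in, and no events otherwise.
    """
    company = offer["company"]
    incoming = offer.get("students_selected", [])
    existing = store.get(company)

    if existing is None:
        store[company] = {"company": company, "students_selected": []}
        existing = store[company]
        event_type = "new_offer"
    else:
        event_type = "update_offer"

    # De-duplicate by the (hypothetical) enrollment key while merging.
    known = {s["enrollment"] for s in existing["students_selected"]}
    added = 0
    for student in incoming:
        if student["enrollment"] not in known:
            existing["students_selected"].append(student)
            known.add(student["enrollment"])
            added += 1

    if event_type == "update_offer" and added == 0:
        return []  # Nothing changed, so nothing to announce.
    return [{"type": event_type, "company": company, "added": added}]


store: Dict[str, Dict] = {}
first = upsert_offer(store, {"company": "Acme", "students_selected": [{"enrollment": "E1"}]})
second = upsert_offer(store, {"company": "Acme", "students_selected": [{"enrollment": "E1"}, {"enrollment": "E2"}]})
third = upsert_offer(store, {"company": "Acme", "students_selected": [{"enrollment": "E2"}]})
```

Suppressing the event when a repeat offer adds nobody avoids duplicate notifications if the same email is processed twice.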

flowchart TD
    Start(["Start"]) --> GetOffers["Call get_all_offers(limit)"]
    GetOffers --> Offers["Retrieve placement offers"]
    Offers --> Calc["PlacementStatsCalculatorService.calculate_all_stats"]
    Calc --> Stats["PlacementStats result"]
    Stats --> End(["End"])


Email Processing and Privacy Sanitization

The PlacementService orchestrates classification, extraction, validation, and privacy sanitization of placement emails. Only emails classified as final placement offers are processed, and their content is sanitized before it is saved in order to protect privacy.

sequenceDiagram
    participant Email as "Email"
    participant PS as "PlacementService"
    participant DB as "DatabaseService"
    Email->>PS : Email content
    PS->>PS : Classify (keywords + confidence)
    alt Is placement offer
        PS->>PS : Extract + Validate + Sanitize
        PS->>DB : Save placement offers
        DB-->>PS : Events (new/update)
        PS-->>Email : Mark as read
    else Not placement offer
        PS-->>Email : Mark as read (skip)
    end
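
The sanitization step in the flow above can be approximated with a line filter. This is a rough sketch under stated assumptions, not the PlacementService logic: the patterns `HEADER_LINE` and `FORWARD_MARKER` and the function `sanitize_email_body` are invented for the example and cover only common English header names.

```python
import re

# Lines that look like mail headers, e.g. "From: ..." or "Subject: ...".
HEADER_LINE = re.compile(
    r"^(from|to|cc|bcc|sent|date|subject|reply-to)\s*:.*$", re.IGNORECASE
)
# Separator lines such as "---------- Forwarded message ---------".
FORWARD_MARKER = re.compile(r"^-{2,}\s*forwarded message\s*-{2,}$", re.IGNORECASE)


def sanitize_email_body(text: str) -> str:
    """Drop header-like lines and forward markers, then tidy blank lines."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if HEADER_LINE.match(stripped) or FORWARD_MARKER.match(stripped):
            continue
        kept.append(line)
    cleaned = "\n".join(kept)
    # Collapse runs of blank lines left behind by the removed headers.
    return re.sub(r"\n{3,}", "\n\n", cleaned).strip()


raw = (
    "---------- Forwarded message ---------\n"
    "From: Placement Cell <tnp@example.com>\n"
    "Date: Mon, 1 Jan 2024\n"
    "Subject: Offer\n"
    "\n"
    "Congratulations to the selected students.\n"
)
```

A purely line-based filter like this would also drop a body line that happens to start with "Date:" or "To:", so a production version would likely restrict matching to the header block that follows a forward marker.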

Diagram sources: placement_service.py

Section sources: placement_service.py

Data Models and Storage

The system stores placement offers and related data in MongoDB collections. The PlacementOffers collection holds extracted offers with company, roles, students, and metadata. The OfficialPlacementData collection stores aggregated statistics snapshots.

erDiagram
    PLACEMENT_OFFERS {
        ObjectId _id PK
        string company
        array roles
        array students_selected
        date created_at
        date updated_at
    }
    USERS {
        ObjectId _id PK
        string user_id
        boolean subscription_active
        date registered_at
        date last_active
    }
    NOTICES {
        ObjectId _id PK
        string id
        string title
        string content
        string source
        string category
        boolean sent_to_telegram
        date created_at
        date updated_at
    }
    JOBS {
        ObjectId _id PK
        string job_id
        string company
        string job_title
        date posted_at
        date updated_at
    }
    OFFICIAL_PLACEMENT_DATA {
        ObjectId _id PK
        string data_id
        date timestamp
        json overall_statistics
        json branch_wise
        array company_wise
        json sector_wise
        date created_at
        date updated_at
    }
    PLACEMENT_OFFERS ||--o{ USERS : "referenced by notifications"
    PLACEMENT_OFFERS ||--o{ NOTICES : "related to"
    JOBS ||--o{ NOTICES : "used for"
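
To make the PLACEMENT_OFFERS entity above concrete, the sketch below builds one example document and checks its top-level fields. Only the top-level field names and types come from the diagram. The nested keys inside `roles` and `students_selected`, the sample values, and the helper `missing_or_mistyped` are assumptions for illustration.

```python
from datetime import datetime, timezone

# Top-level fields and their Python types, taken from the ER diagram.
REQUIRED_FIELDS = {
    "company": str,
    "roles": list,
    "students_selected": list,
    "created_at": datetime,
    "updated_at": datetime,
}


def missing_or_mistyped(doc: dict) -> list:
    """Return the sorted names of required fields that are absent or mistyped."""
    return sorted(
        name
        for name, expected in REQUIRED_FIELDS.items()
        if not isinstance(doc.get(name), expected)
    )


now = datetime.now(timezone.utc)
example_offer = {
    "company": "Acme",
    # Nested keys below are hypothetical; the source only says "array".
    "roles": [{"role": "SDE", "package": 12.0}],
    "students_selected": [{"name": "Student A", "enrollment": "E001", "role": "SDE"}],
    "created_at": now,
    "updated_at": now,
}
```

The `_id` field is left out because MongoDB assigns an ObjectId on insert when none is supplied.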

Diagram sources: DATABASE.md

Section sources: DATABASE.md

Dependency Analysis

The statistics service depends on:

- DatabaseService for data retrieval and persistence
- PlacementService for extracting and structuring placement offers
- Configuration and DBClient for environment and database connectivity

graph TB
    Calc["PlacementStatsCalculatorService"] --> DB["DatabaseService"]
    Calc --> Config["Settings (config.py)"]
    DB --> Client["DBClient"]
    Client --> Mongo["MongoDB"]
    PS["PlacementService"] --> DB


Performance Considerations

- Caching: Settings are cached via lru_cache to avoid repeated environment loading
- Batch operations: DatabaseService uses bulk upserts and aggregation for statistics
- Indexing: Recommended indexes on frequently queried fields (e.g., company, processing_status, timestamps)
- Connection pooling: PyMongo manages connection pooling automatically
- Real-time updates: Events emitted on new/updated offers trigger immediate recalculations
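
The settings caching mentioned in the list above can be sketched with the standard library. To keep the example dependency-free it swaps in a frozen dataclass for Pydantic Settings, so it shows the caching pattern only; the `Settings` fields and the `get_settings` name are assumptions, while `MONGO_CONNECTION_STR` is the variable named in this document.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    """Stand-in for the project's Pydantic Settings class (assumed shape)."""
    mongo_connection_str: str


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Read the environment once and reuse the same object afterwards."""
    return Settings(
        mongo_connection_str=os.environ.get("MONGO_CONNECTION_STR", ""),
    )
```

Because the result is cached, later changes to the environment are not picked up until `get_settings.cache_clear()` is called, which is worth remembering in tests that patch environment variables.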


Troubleshooting Guide

Common issues and resolutions:

- Database connection failures: Verify MONGO_CONNECTION_STR and, when using MongoDB Atlas, confirm that the client IP address is on the IP access list
- Empty statistics: Ensure placement offers exist in the PlacementOffers collection and are not filtered out by exclusions
- Privacy concerns: PlacementService sanitizes email headers and forwarded information before saving
- Export issues: Check the filters passed to the CSV export and confirm that matching placements exist


Conclusion

The statistics calculation and analytics service provides a robust, modular framework for computing placement metrics, enabling branch-wise and company-wise analysis, filtering, and export capabilities. Through integration with the database and email processing pipeline, it supports real-time updates and historical trend analysis while maintaining privacy and performance best practices.


Appendices

Calculation Methods and Metrics

- Average package: Sum of highest packages per unique student divided by count
- Median package: Middle value of sorted packages for unique students
- Highest package: Maximum package among unique students
- Placement percentage: Unique placed students in tracked branches divided by total eligible students, multiplied by 100
- Company-wise average package: Sum of packages per company divided by number of packages
- Branch-wise statistics: Total offers, unique students, total eligible students, average/median/highest packages, and placement percentage
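
The overall package metrics and the placement percentage defined above can be worked through in a few lines. This is a sketch of the stated formulas only: the input shape (pairs of enrollment number and package) and the function names are assumptions, and returning zeros for empty input is a choice made for the example.

```python
from statistics import mean, median
from typing import Dict, Iterable, Tuple


def highest_package_per_student(offers: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Reduce (enrollment, package) pairs to the best package per student."""
    best: Dict[str, float] = {}
    for enrollment, package in offers:
        if enrollment not in best or package > best[enrollment]:
            best[enrollment] = package
    return best


def package_stats(offers: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average, median, and highest package over unique students."""
    packages = list(highest_package_per_student(offers).values())
    if not packages:
        return {"average": 0.0, "median": 0.0, "highest": 0.0}
    return {
        "average": mean(packages),
        "median": median(packages),
        "highest": max(packages),
    }


def placement_percentage(unique_placed: int, total_eligible: int) -> float:
    """Unique placed students divided by eligible students, times 100."""
    if total_eligible <= 0:
        return 0.0  # Avoid division by zero when counts are not configured.
    return unique_placed / total_eligible * 100


offers = [("E1", 10.0), ("E1", 14.0), ("E2", 8.0), ("E3", 6.0)]
stats = package_stats(offers)
```

Here student E1 holds two offers but contributes only the higher one (14.0), so the statistics are taken over 14.0, 8.0, and 6.0 rather than over all four offers.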


Visualization Data Formats

- Branch-wise metrics: Dictionary with branch names as keys and computed statistics as values
- Company-wise metrics: Dictionary with company names as keys and counts, profiles, and average packages
- Available filters: Lists of companies, roles, and locations derived from placements
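
The company-wise format described above can be sketched as a dataclass converted to a JSON-friendly dictionary. The `CompanyStats` fields follow the class diagram in this document, while the default values and the helper `company_stats_to_dict` are assumptions. The one non-obvious step is that a `Set[str]` is not JSON serializable, so the sketch converts `profiles` to a sorted list.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Set


@dataclass
class CompanyStats:
    company: str
    students_count: int = 0
    profiles: Set[str] = field(default_factory=set)
    packages: List[float] = field(default_factory=list)
    avg_package: float = 0.0


def company_stats_to_dict(stats: CompanyStats) -> dict:
    """Convert to plain types so the result can be passed to json.dumps."""
    data = asdict(stats)
    # Sets cannot be serialized to JSON; a sorted list is stable for charts.
    data["profiles"] = sorted(stats.profiles)
    return data


acme = CompanyStats("Acme", 2, {"SDE", "Analyst"}, [10.0, 12.0], 11.0)
payload = json.dumps({acme.company: company_stats_to_dict(acme)})
decoded = json.loads(payload)
```

Keying the outer dictionary by company name matches the "company names as keys" format, and sorting the profiles keeps the output deterministic between runs.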


Export Capabilities

- CSV export: Generates rows with student name, enrollment number, company, role, package, job location, and joining date
- Filtered exports: Applies filters before generating CSV rows
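
The row layout described above can be sketched with the standard `csv` module. The column order follows the list in this section and the `List[List[str]]` return type follows the class diagram. The header labels, the record keys in `FIELD_KEYS`, the inclusion of a header row, and both function names are assumptions.

```python
import csv
import io
from typing import Dict, List

CSV_HEADER = [
    "Student Name", "Enrollment Number", "Company", "Role",
    "Package", "Job Location", "Joining Date",
]
# Hypothetical record keys, in the same order as CSV_HEADER.
FIELD_KEYS = ["name", "enrollment", "company", "role", "package", "location", "joining_date"]


def to_csv_rows(students: List[Dict]) -> List[List[str]]:
    """Build a header row followed by one string row per student."""
    rows = [list(CSV_HEADER)]
    for student in students:
        rows.append(
            ["" if student.get(key) is None else str(student.get(key)) for key in FIELD_KEYS]
        )
    return rows


def rows_to_csv_text(rows: List[List[str]]) -> str:
    """Serialize rows to CSV text; the csv module handles quoting."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = to_csv_rows([
    {"name": "Student A", "enrollment": "E001", "company": "Acme, Inc.",
     "role": "SDE", "package": 12.0, "location": "Noida"},
])
text = rows_to_csv_text(rows)
```

Letting `csv.writer` do the quoting matters for values such as "Acme, Inc.", which contain the delimiter. For a filtered export, the student list would be filtered first and then passed to the same row builder.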

Section sources: placement_stats_calculator_service.py

Data Privacy and Compliance

- Privacy sanitization: Removal of email headers, forwarded markers, and sender information during extraction
- Data retention: Consider TTL indexes for auto-cleanup of logs and temporary data
- Institutional policies: Ensure compliance with educational institution data policies and student privacy regulations

